25 research outputs found

    The self-organizing map as a visual neighbor retrieval method

    We have recently introduced rigorous goodness criteria for information visualization by posing it as a visual neighbor retrieval problem, where the task is to find proximate high-dimensional data based only on a low-dimensional display. Standard information retrieval criteria such as precision and recall can then be used for information visualization. We introduced an algorithm, Neighbor Retrieval Visualizer (NeRV), to optimize the total cost of retrieval errors. NeRV was shown to outperform alternative methods, but the SOM was not included in the comparison. In the empirical experiments of this paper the SOM turns out to be comparable to the best methods in terms of (smoothed) precision but not in terms of recall. On a related measure called trustworthiness, the SOM outperforms all others. Finally, we suggest that for information visualization tasks the free parameters of the SOM could be optimized with cross-validation.
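The retrieval view described in this abstract can be illustrated with a minimal sketch. It treats the k nearest neighbors of a point in the original space as the relevant items and the k nearest neighbors on the display as the retrieved items, then averages precision and recall over all points. This is the plain set-based version, not the smoothed measures used in the paper; the function names, the Euclidean distance, and the index-order tie-breaking are assumptions made for illustration.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbors of point i (Euclidean), excluding i.

    Ties are broken by index order because sorted() is stable.
    """
    others = (j for j in range(len(points)) if j != i)
    order = sorted(others, key=lambda j: math.dist(points[i], points[j]))
    return set(order[:k])


def neighbor_precision_recall(high, low, k_relevant, k_retrieved):
    """Mean precision and recall of neighbor retrieval from the display.

    high: coordinates in the original space (relevant neighbors come from here)
    low:  coordinates on the display (retrieved neighbors come from here)
    """
    n = len(high)
    precision_sum = 0.0
    recall_sum = 0.0
    for i in range(n):
        relevant = knn(high, i, k_relevant)
        retrieved = knn(low, i, k_retrieved)
        hits = len(relevant & retrieved)
        precision_sum += hits / k_retrieved  # share of retrieved that are relevant
        recall_sum += hits / k_relevant      # share of relevant that were retrieved
    return precision_sum / n, recall_sum / n


if __name__ == "__main__":
    # Hypothetical toy data: four points on a line in 3-D and their 1-D display.
    high = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0)]
    low = [(0,), (1,), (2,), (10,)]
    print(neighbor_precision_recall(high, low, k_relevant=1, k_retrieved=2))
```

Retrieving more neighbors from the display than are relevant tends to raise recall and lower precision, which is the tradeoff the abstract refers to.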

    Trustworthiness and metrics in visualizing similarity of gene expression

    BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Trustworthiness. Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets. RESULTS: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map was compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best, except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric. CONCLUSIONS: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric improves the visualization further.
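The trustworthiness question posed in this abstract (if two samples look similar on the display, are they really similar?) can be made concrete with a small sketch. It assumes the commonly cited rank-based definition, T(k) = 1 - 2 / (N k (2N - 3k - 1)) * sum over points i and over display neighbors j of i that are not among the k true neighbors of (r(i, j) - k), where r(i, j) is the rank of j by distance from i in the original space. The abstract does not state the formula, so this definition, the Euclidean distance, and the helper names are assumptions; the normalization is meant for k < N/2.

```python
import math


def ranked_neighbors(points, i):
    """All other point indices ordered by Euclidean distance from point i.

    Ties are broken by index order because sorted() is stable.
    """
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))


def trustworthiness(high, low, k):
    """Rank-based trustworthiness of a display for neighborhoods of size k.

    high: coordinates in the original space
    low:  coordinates on the display
    Returns 1.0 when every k-neighborhood on the display contains only
    true k-nearest neighbors from the original space.
    """
    n = len(high)
    penalty = 0
    for i in range(n):
        order_high = ranked_neighbors(high, i)
        rank_high = {j: r for r, j in enumerate(order_high, start=1)}
        true_neighbors = set(order_high[:k])
        for j in ranked_neighbors(low, i)[:k]:
            if j not in true_neighbors:
                # j intrudes into the displayed neighborhood; penalize by how
                # far outside the true k-neighborhood it actually lies.
                penalty += rank_high[j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


if __name__ == "__main__":
    # Hypothetical toy data: six points on a line; the display wrongly
    # places the last point next to the first one.
    high = [(0,), (1,), (2,), (3,), (4,), (5,)]
    low = [(0,), (1,), (2,), (3,), (4,), (0.1,)]
    print(trustworthiness(high, high, k=1))
    print(trustworthiness(high, low, k=1))
```

A display identical to the original configuration scores 1.0, and the score drops as points that are far apart in the original space are drawn close together.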

    Dimensionality reduction for visual exploration of similarity structures (Dimensionalisuuden pienentäminen samankaltaisuuksien visuaalista tarkastelua varten)

    No full text
    Visualizations of similarity relationships between data points are commonly used in exploratory data analysis to gain insight into new data sets. Answers are sought to questions such as: Does the data consist of separate groups of points? What is the relationship of the previously known interesting data points to other data points? Which points are similar to the points known to be of interest? Visualizations can be used both to amplify the cognition of the analyst and to help in communicating interesting similarity structures found in the data to other people. One of the main problems faced in information visualization is that while the data is typically very high-dimensional, the display is limited to only two or at most three dimensions. Thus, for visualization, the dimensionality of the data has to be reduced. In general, it is not possible to preserve all pairwise relationships between data points in the dimensionality reduction process. This has led to the development of a large number of dimensionality reduction methods that focus on preserving different aspects of the data. Most of these methods were not developed to be visualization methods, which makes it hard to assess their suitability for the task of visualizing similarity structures. This problem is made more severe by the lack of suitable quality measures in the information visualization field. In this thesis a new visualization task, visual neighbor retrieval, is introduced. It formulates information visualization as an information retrieval task. To assess the performance of dimensionality reduction methods in this task, two pairs of new quality measures are introduced and the performance of several dimensionality reduction methods is analyzed. Based on the insight gained from the existing methods, three new dimensionality reduction methods (NeRV, fNeRV and LocalMDS), aimed at the visual neighbor retrieval task, are introduced. All three new methods outperform other methods in numerical experiments; they vary in their speed and accuracy. A new color coding scheme, similarity-based color coding, is introduced in this thesis for visualization of similarity structures, and the applicability of the new methods to the task of creating graph layouts is studied. Finally, new approaches to visually studying the results and convergence of Markov chain Monte Carlo methods are introduced.

    Visualization of similarity relationships is often used in exploratory data analysis as the first step in examining a new data set. The aim is to form a preliminary understanding of the structure of the data and to answer questions such as: Does the data divide into separate groups? What is the relationship of previously observed interesting data points to new, unknown data points? Which points are similar to the points known to be of interest? Visualization can both ease the analysis of the data and help in communicating the structures found. In information visualization the data is typically high-dimensional. This is problematic because a display can show at most three dimensions at a time. For this reason the dimensionality of the data must be reduced to two or three for visualization. Dimensionality reduction almost always introduces some errors; it is not possible to preserve all similarity relationships in the data unchanged, so information is lost and distorted. Different dimensionality reduction methods therefore aim to preserve different properties of the data. A problem in information visualization is that most dimensionality reduction methods were not developed for visualization, so the quality of the images they produce has to be verified. The lack of suitable measures makes this verification difficult. This thesis presents a new visualization task, visual neighbor retrieval, in which information visualization is framed as an information retrieval task. This formulation is used to construct two pairs of new visualization quality measures, and the suitability of several dimensionality reduction methods for the new task is studied. Ideas drawn from the results are used to create three new dimensionality reduction methods (NeRV, fNeRV and LocalMDS). All three have outperformed other methods in numerical experiments; they differ from one another in speed and accuracy. In addition, this work presents a new method for allocating colors to data points, similarity-based color coding, and tests the suitability of the new methods for graph visualization. Finally, visualization of the convergence and results of Markov chain Monte Carlo methods is examined.